257 research outputs found

    No more hidden solutions in bioinformatics.


    Variation ontology: annotator guide.

    Systematic representation of information related to genetic and non-genetic variations is required to allow large-scale studies, data mining and data integration, and to make it possible to reveal novel relationships between genotype and phenotype. Although a lot of variation data is available, it is often difficult to use due to a lack of systematics.

    Solubility of proteins

    Solubility is a fundamental protein property that has important connotations for therapeutics and use in diagnosis. The solubility of many proteins is low, which affects heterologous overexpression of proteins, formulation of products and their stability. Two processes are related to soluble and solid phase relations. Solubility refers to the process where proteins have correctly folded structure, whereas aggregation is related to the formation of fibrils, oligomers or amorphous particles. Both processes are related to some diseases. Amyloid fibril formation is one of the characteristic features in several neurodegenerative diseases, but it is related to many other diseases, including cancers. Severe complex V deficiency and cataract are examples of diseases due to reduced protein solubility. Methods and approaches are described for prediction of protein solubility and aggregation, as well as predictions of consequences of amino acid substitutions. Finally, protein engineering solutions are discussed. Protein solubility can be increased, although such alterations are relatively rare and can lead to trade-offs with some other properties. The aggregation prediction methods mainly aim to detect aggregation-prone sequence patches and then make them more soluble. The solubility predictors utilize a wide spectrum of features.

    Types and effects of protein variations.

    Variations in proteins have a very large number of diverse effects affecting sequence, structure, stability, interactions, activity, abundance and other properties. Although protein-coding exons cover just over 1% of the human genome, they harbor a disproportionately large portion of disease-causing variants. Variation ontology (VariO) has been developed for annotation and description of variation effects, mechanisms and consequences. A holistic view of variations in proteins is made available along with examples of real cases. Protein variants can be of genetic origin or emerge at the protein level. Systematic names are provided for all variation types; a more detailed description can be made by explaining changes to protein function, structure and properties. Examples are provided for the effects and mechanisms, usually in relation to human diseases. In addition, the examples are selected so that protein 3D structural changes, when relevant, are included and visualized. Here, systematics is described for protein variants based on VariO. It will benefit the unequivocal description of variations and their effects and further reuse and integration of data from different sources.

    Spectrum of disease-causing mutations in protein secondary structures

    BACKGROUND: Most genetic disorders are linked to missense mutations as even minor changes in the size or properties of an amino acid can alter or prevent the function of the protein. Further, the effect of a mutation is also dependent on the sequence and structure context of the alteration. RESULTS: We investigated the spectrum of disease-causing missense mutations in secondary structure elements in proteins with numerous known mutations and for which an experimentally defined three-dimensional structure is available. We obtained a comprehensive map of the differences in mutation frequencies, location and contact energies, and the changes in residue volume and charge – both in the mutated (original) amino acids and in the mutant amino acids in the different secondary structure types. We collected information for 44 different proteins involved in a large number of diseases. The studied proteins contained a total of 2413 mutations of which 1935 (80%) appeared in secondary structures. Differences in mutation patterns between secondary structures and whole proteins were generally not statistically significant whereas within the secondary structural elements numerous highly significant features were observed. CONCLUSION: Numerous trends in mutated and mutant amino acids are apparent. Among the original residues, arginine clearly has the highest relative mutability. The overall relative mutability among mutant residues is highest for cysteine and tryptophan. The mutability values are higher for mutated residues than for mutant residues. Arginine and glycine are among the most mutated residues in all secondary structures whereas the other amino acids have large variations in mutability between structure types. Statistical analysis was used to reveal trends in different secondary structural elements, residue types as well as for the charge and volume changes.

    Prediction of disease-related mutations affecting protein localization

    BACKGROUND: Eukaryotic cells contain numerous compartments, which have different protein constituents. Proteins are typically directed to compartments by short peptide sequences that act as targeting signals. Translocation to the proper compartment allows a protein to form the necessary interactions with its partners and take part in biological networks such as signalling and metabolic pathways. If a protein is not transported to the correct intracellular compartment either the reaction performed or information carried by the protein does not reach the proper site, causing either inactivation of central reactions or misregulation of signalling cascades, or the mislocalized active protein has harmful effects by acting in the wrong place. RESULTS: Numerous methods have been developed to predict protein subcellular localization with quite high accuracy. We applied bioinformatics methods to investigate the effects of known disease-related mutations on protein targeting and localization by analyzing over 22,000 missense mutations in more than 1,500 proteins with two complementary prediction approaches. Several hundred putative localization affecting mutations were identified and investigated statistically. CONCLUSION: Although alterations to localization signals are rare, these effects should be taken into account when analyzing the consequences of disease-related mutations.

    Efficiency of the immunome protein interaction network increases during evolution

    Details of the mechanisms and selection pressures that shape the emergence and development of complex biological systems, such as the human immune system, are poorly understood. A recent definition of a reference set of proteins essential for the human immunome, combined with information about protein interaction networks for these proteins, facilitates evolutionary study of this biological machinery.

    Clustering of gene ontology terms in genomes.

    Although protein-coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kinds of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of mechanisms that shape genome structure and evolutionary forces related to them.
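
The fixed-gene-count window described in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test was used, so the hypergeometric tail probability below is an assumption, as are the function names, the `alpha` threshold and the toy term labels.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_term_windows(gene_terms, window, alpha=1e-3):
    """Scan windows of `window` consecutive genes (not base pairs).

    gene_terms: list of sets of term labels, in chromosomal order.
    Returns (start_index, term, count, p_value) for each window/term pair
    whose hypergeometric tail probability is at most alpha (assumed test).
    """
    N = len(gene_terms)
    # Number of genes in the whole genome annotated with each term.
    totals = Counter(t for terms in gene_terms for t in set(terms))
    hits = []
    for start in range(N - window + 1):
        in_window = gene_terms[start:start + window]
        counts = Counter(t for terms in in_window for t in set(terms))
        for term, k in counts.items():
            if k < 2:  # a single gene cannot form a cluster
                continue
            p = hypergeom_tail(k, N, totals[term], window)
            if p <= alpha:
                hits.append((start, term, k, p))
    return hits


# Toy genome: 20 genes, five adjacent genes share the hypothetical term "GO:A".
genome = [{"GO:x"}] * 3 + [{"GO:A"}] * 5 + [{"GO:x"}] * 12
hits = go_term_windows(genome, window=5)
```

Counting genes rather than genomic span is what makes windows comparable across regions with different gene lengths and densities. A real analysis would also need multiple-testing correction, which this sketch omits.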

    ImmTree: Database of evolutionary relationships of genes and proteins in the human immune system

    BACKGROUND: The immune system, which is a complex machinery, is based on the highly coordinated expression of a wide array of genes and proteins. The evolutionary history of the human immune system is not well characterised. Although several studies related to the development and evolution of immunological processes have been published, a full-scale genome-based analysis is still missing. A database focused on the evolutionary relationships of immune-related genes would contribute to and facilitate research on immunology and evolutionary biology. RESULTS: An Internet resource called ImmTree was constructed for studying the evolution and evolutionary trees of the human immune system. ImmTree contains information about orthologs in 80 species collected from the HomoloGene, OrthoMCL and EGO databases. In addition to phylogenetic trees, the service provides data for the comparison of human-mouse ortholog pairs, including synonymous and non-synonymous mutation rates, Z values, and K(a)/K(s) quotients. A versatile search engine allows complex queries from the database. Currently, data is available for 847 human immune system related genes and proteins. CONCLUSION: ImmTree provides a unique data set of genes and proteins from the human immune system, their phylogenetics, and information for comparisons of human-mouse ortholog pairs, synonymous and non-synonymous mutation rates, as well as other statistical information.

    Genome-wide selection of unique and valid oligonucleotides

    Functional genomics methods are used to investigate the huge amount of information contained in genomes. Numerous experimental methods rely on the use of oligo- or polynucleotides. Nucleotide strand hybridization forms the underlying principle for these methods. For all these techniques, the probes should be unique for the analyzed genes. In addition to being unique for the studied genes, the probes should fulfill a large number of criteria to be usable and valid. The criteria include, for example, avoidance of self-annealing, suitable melting temperature and nucleotide composition. We developed a method for searching for unique and valid oligonucleotides or probes for genes so that there is not even a similar (approximate) occurrence in any other location of the whole genome. By using probe size 25, we analyzed 17 complete genomes representing a wide range of both prokaryotic and eukaryotic organisms. More than 92% of all the genes in the investigated genomes contained valid oligonucleotides. Extensive statistical tests were performed to characterize the properties of unique and valid oligonucleotides. Unique and valid oligonucleotides were relatively evenly distributed in genes except for the beginning and end, which were somewhat overrepresented. The flanking regions in eukaryotes were clearly underrepresented among suitable oligonucleotides. In addition to distributions within genes, the effects on codon and amino acid usage were also studied.
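
The validity criteria named in this abstract (uniqueness, nucleotide composition, melting temperature, no self-annealing) can be sketched as simple filters over 25-mers. This is a simplified illustration, not the published method. It checks only exact forward-strand matches, whereas the abstract also excludes approximate occurrences. The GC range, the melting temperature range, the stem length and the rough GC-based temperature estimate are all assumptions, as are the function names.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq):
    """Rough GC-based melting temperature estimate in degrees Celsius (assumed model)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def self_anneals(seq, stem=6):
    """True if some stem-length stretch can pair with another stretch of seq."""
    kmers = {seq[i:i + stem] for i in range(len(seq) - stem + 1)}
    return any(reverse_complement(k) in kmers for k in kmers)


def unique_probes(gene, genome, k=25, gc_range=(0.4, 0.6),
                  tm_range=(50.0, 70.0), stem=6):
    """Return (offset, probe) pairs from `gene` that pass every filter.

    `gene` is assumed to be a substring of `genome`, so a unique probe
    occurs exactly once in the genome (exact matching only).
    """
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    probes = []
    for i in range(len(gene) - k + 1):
        probe = gene[i:i + k]
        if counts[probe] != 1:
            continue  # missing or repeated elsewhere in the genome
        if not gc_range[0] <= gc_fraction(probe) <= gc_range[1]:
            continue
        if not tm_range[0] <= approx_tm(probe) <= tm_range[1]:
            continue
        if self_anneals(probe, stem):
            continue
        probes.append((i, probe))
    return probes


# Toy example: one 25-mer "gene" embedded in a repetitive background.
toy_gene = "G" * 12 + "A" * 13
toy_genome = "A" * 30 + toy_gene + "T" * 30
found = unique_probes(toy_gene, toy_genome)
```

Extending the uniqueness test to approximate matches would require a mismatch-tolerant search over the genome, which is the computationally demanding part that this sketch leaves out.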